ABSTRACT

Objectives To test the feasibility of using text mining 
to depict meaningfully the experience of pain in patients 
with metastatic prostate cancer, to identify novel pain 
phenotypes, and to propose methods for longitudinal 
visualization of pain status. 
Materials and methods Text from 4409 clinical
encounters for 33 men enrolled in a 15-year longitudinal
clinical/molecular autopsy study of metastatic prostate cancer
(Project to Eliminate lethal CANcer) was subjected to natural
language processing (NLP) using Unified Medical Language
System-based terms. A four-tiered pain scale was developed,
and logistic regression analysis identified factors that
correlated with experience of severe pain during each month.
Results NLP identified 6387 pain and 13 827 drug 
mentions in the text. Graphical displays revealed the pain 
'landscape' described in the textual records and confirmed 
dramatically increasing levels of pain in the last years of life 
in all but two patients, all of whom died from metastatic 
cancer. Severe pain was associated with receipt of opioids 
(OR=6.6, p<0.0001) and palliative radiation (OR=3.4,
p=0.0002). Surprisingly, no severe or controlled pain was 
detected in two of 33 subjects' clinical records. Additionally, 
the NLP algorithm proved generalizable in an evaluation 
using a separate data source (889 Informatics for Integrating 
Biology and the Bedside (i2b2) discharge summaries). 
Discussion Patterns in the pain experience, undetectable 
without the use of NLP to mine the longitudinal clinical 
record, were consistent with clinical expectations, suggesting 
that meaningful NLP-based pain status monitoring is 
feasible. Findings in this initial cohort suggest that 'outlier' 
pain phenotypes useful for probing the molecular basis of 
cancer pain may exist. 

Limitations The results are limited by a small cohort size 
and use of proprietary NLP software. 
Conclusions We have established the feasibility of 
tracking longitudinal patterns of pain by text mining of free 
text clinical records. These methods may be useful for 
monitoring pain management and identifying novel cancer 
phenotypes. 



INTRODUCTION 

Pain is a debilitating part of the experience of metastatic
cancer. An automated system to categorize and
track pain in electronic medical records could provide
a powerful means to improve clinical care, and could
allow novel 'high pain' or 'low pain' phenotypes to
be defined and studied on a molecular basis. We
tested the feasibility of using natural language processing
(NLP) of text from clinical encounters to
depict meaningfully the experience of pain in patients
with metastatic prostate cancer over time.



BACKGROUND 

Worldwide, prostate cancer is the second most commonly
diagnosed cancer and the sixth leading cause
of cancer death in men. In the past decade, significant
effort has been made to better understand and
reduce the burden of pain on the cancer patient,
the patient's family, caregivers, and society. Pain
status can predict survival in metastatic prostate
cancer, and changes in pain status have been examined
as a surrogate marker of effectiveness of new
therapies. Several validated pain survey tools have
been proposed for routine clinical care.

NLP has been used to quantify associations
between diseases, conditions, and symptoms,
for vocabulary discovery, and for cohort construction.
NLP applications focusing on pain
in clinical records have successfully detected the
experience of pain in free text within an electronic
medical record. Some studies suggest that, in
some scenarios, NLP of medical record text may
perform better than patient-completed surveys in
detection of clinically relevant pain.

Although pain has previously been normalized
and classified manually for purposes of statistical
correlations, we used NLP to automatically characterize
the experience of pain over thousands of
records. To our knowledge, this is the first study to
combine NLP, date resolution, and statistical analysis
to create a longitudinal study of pain in the clinical
record. Our system normalized each mention of
pain in longitudinal clinical records by severity classification
and number of days before death. We used
regression modeling techniques to analyze both the
newly structured data and the existing structured
data to search for phenotypic correlations with pain
in the context of metastatic prostate cancer.

Pain management is fundamental to effective
clinical care, and significant pain is a consequence
of the disordered biology of many cancers. This
study tests the feasibility of automatically tracking
patient pain over time using NLP of clinical record
text. If NLP-based pain tracking is feasible, further
study may be indicated to test the hypothesis that
adoption of NLP-based pain tracking within electronic
health record systems could provide significant
added value in clinical care and in advancing
research in disease phenotyping.

METHODS 
Patient cohort 

Thirty-three men from the PELICAN (Project to
ELIminate lethal CANcer) integrated clinical/molecular
autopsy study of metastatic prostate cancer were
the subjects of this study. Subjects joined the institutional review
board-approved study between 1995 and 2005. The mean age of
the study subjects at the time of diagnosis of prostate cancer was
62 years (range 42-75). The mean interval between diagnosis and
death was 6.3 years (range 0.8-15.4). Of the 33 subjects in the
study, 27 were Caucasian, five were African-American, and one
was of Hispanic background. Six subjects were seen only in community
hospital inpatient, clinic and private office settings; the
remaining 27 subjects were followed in a combination of oncology
center and community hospital clinic settings.

Clinical records obtained 

The study obtained and analyzed all available paper, electronic,
radiologic, radiation therapy, and pathology medical records for
each subject. Subjects provided a list of institutions and physician
offices where medical care was received, and copies of
medical records from the various institutions and offices were
obtained.

Creation of electronic records for each study subject 

A total of 23 887 pages of paper records were converted into 
electronic text using methods described in the online appendix. 
The electronic record included laboratory values, radiology 
reports, pathology reports, and records of inpatient and outpatient
encounters with providers. The text recorded in 4409
inpatient and outpatient encounters is called the 'PELICAN 
corpus' and is the focus of this study. 

Integrated Life Sciences Research (ILSR) database 
and removal of identifiers 

The full curated electronic text of each paper record was placed
in the ILSR database, a system created to support the PELICAN
Study. The average number of inpatient or outpatient records
per year between diagnosis and death was 32 (range 4-212).
Subject date of birth, date of death, race/ethnicity, all available
serum prostate-specific antigen (PSA) concentrations, body
weight measurements, body height measurements, and radiation
therapy records were separately tabulated in ILSR by project
data curators.

Pain status categorization 

A multidisciplinary team consisting of NLP software developers,
medical subject matter experts (SMEs), and statisticians developed
a pain categorization model based on a conservative four-tiered
pain scale: no pain (category 0); some pain (category 1);
controlled pain (category 2); severe pain (category 3).

Natural language processing 

We used ClinREAD, a proprietary healthcare-domain-oriented,
rule-based NLP system (Lockheed Martin, Bethesda, Maryland,
USA) built on AeroText (Rocket Software, Newton, Minnesota,
USA) and previously successfully used by members of the study
team in the Informatics for Integrating Biology and the Bedside
(i2b2) obesity challenge. ClinREAD was chosen because of
its availability to the project team and team familiarity with its
use. Other valid approaches, including machine learning, were
not used because of lack of available resources for the current
project. The first stage of the current project involved iterative
development and evaluation of NLP-based pain extraction and
qualification (severity, anatomy, and date) in the 4409-record
PELICAN corpus, for the purposes of discovery over a closed
dataset. During this stage, we made iterative modifications to
our entire system, data model, normalization rules, and vocabulary
(details in online appendix). We tested the generalizability
of the NLP methods on 889 unannotated, deidentified discharge
summaries provided courtesy of i2b2.

The system rated each mention of experienced or explicitly
denied pain on the basis of the context in which it was found
(table 1). We developed 42 pain severity contextual rules, such
as (complete list in online appendix table 2):
[pain severity modifier] [body location] [pain term]
[pain term] [to be] [adv] [PainSeverityComplement]
[pain term] ... [pain severity complement] out of [10 | ten]
Vocabulary from the Unified Medical Language System (UMLS;
version 2010AB) was imported via the Metathesaurus from 35
level 0 source vocabularies (see online appendix table 3). We
selected 16 semantic types based on the domain of the data as
shown in online appendix table 4. Lookup tables were created
from each set of synonymous terms in order to associate each
phrase with a preferred term and a UMLS concept unique identifier
(CUI). A filtering process similar to that of Roberts et al was
used to remove irrelevant terms. After filtering, a total of 675 000
terms and phrases were contained in the study vocabulary.
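
The contextual rules above can be illustrated with a minimal sketch. It is not the ClinREAD implementation: the regular expressions, the tiny `PAIN_TERM` and `BODY_LOCATION` vocabularies, and the `classify` function are hypothetical stand-ins chosen for illustration, and only a few of the table 1 modifiers are included. The default of category 1 for an unqualified pain mention follows the study's definition of category 1 as reported pain not described as controlled or severe.

```python
import re

# Hypothetical stand-ins for the UMLS-derived pain and body-location vocabularies.
PAIN_TERM = r"(?:pain|ache|discomfort)"
BODY_LOCATION = r"(?:back|hip|chest|bone|abdominal)"

# A subset of the example modifiers from table 1, mapped to the four-tiered scale.
MODIFIER_CATEGORY = {
    "no": 0, "without": 0, "denied any": 0, "absence of": 0,
    "some": 1, "mild": 1, "intermittent": 1, "occasional": 1, "negligible": 1,
    "controlled": 2, "essentially controlled": 2,
    "severe": 3, "significant": 3, "crushing": 3, "excruciating": 3, "exquisite": 3,
}

# Rule shape: [pain severity modifier] [body location] [pain term]
_modifiers = "|".join(
    sorted(map(re.escape, MODIFIER_CATEGORY), key=len, reverse=True)
)
MODIFIER_RULE = re.compile(
    rf"\b({_modifiers})\s+(?:{BODY_LOCATION}\s+)?{PAIN_TERM}\b", re.IGNORECASE
)

# Rule shape: [pain term] ... [pain severity complement] out of [10 | ten]
RATING_RULE = re.compile(
    rf"\b{PAIN_TERM}\b.{{0,40}}?\b(\d{{1,2}})\s+out of\s+(?:10|ten)\b",
    re.IGNORECASE,
)


def classify(sentence):
    """Return a pain category 0-3, or None if no pain term is present."""
    rating = RATING_RULE.search(sentence)
    if rating:
        score = int(rating.group(1))
        if score == 0:
            return 0
        return 3 if score >= 8 else 1  # table 1: [1-7] some pain, [8-10] severe
    modifier = MODIFIER_RULE.search(sentence)
    if modifier:
        return MODIFIER_CATEGORY[modifier.group(1).lower()]
    if re.search(rf"\b{PAIN_TERM}\b", sentence, re.IGNORECASE):
        return 1  # pain reported but not described as controlled or severe
    return None
```

For example, `classify("He continues to have significant pain")` yields category 3 and `classify("The patient denied any back pain")` yields category 0. A production system would also need the complement rules, conditionals, and hypotheticals that this sketch leaves out.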

Table 1 Example pain severity indicators

| | No pain (category 0) | Some pain (category 1) | Controlled pain (category 2) | Severe pain (category 3) |
|---|---|---|---|---|
| Example modifiers (occurring before pain mention) | No | Some | Controlled | Severe |
| | Without | Mild | Controlling | Significant |
| | No complaint | Intermittent | Treatment for | Crushing |
| | Denied any | Occasional | Essentially controlled | Excruciating |
| | Absence of | Negligible | To control this | Exquisite |
| Example complements (occurring after pain mention) | Relieved | Dull | Controlled | Intractable |
| | Resolved | Not too bad | Managed | [8-10] |
| | 0 | [1-7] | Well managed | [Eight-ten] |
| | Zero | [One-seven] | | Persistent |
| | | | | Unbearable |
| | | | | Uncontrollable |

We combined the vocabulary terms with context patterns in
order to recognize internal dates, negatives, conditionals, and
pain severity. These context patterns were developed manually.
ClinREAD, like MedLEE, is rule-based. Each clinical concept
('sign or symptom', 'finding', 'injury or poisoning', 'disease or
syndrome', or 'neoplastic process') is associated with a date and a
body location; see online appendix for further detail. The system
resolved incomplete dates (eg, 'in July') based on the date of the
encounter, and resolved relative dates (eg, 'four days prior to
admission') based on the previous date mention. Each resolved
date is represented as a range (startdate, enddate). This date resolution
component was based on the development team's previous
work and is described in the online appendix. When dates
were missing, the date of the clinical encounter was used as the
default. Date associations were used to normalize the clinical
concept to the number of days before death, for each individual
study subject. This calculation is enabled through the conversion
of the midpoint of absolute date ranges to the modified Julian
format. Each mention of pain was associated with a severity
level from the four-tiered pain scale. A subset of 637 strings from
semantic type 'sign or symptom' were identified as indicating
pain, listed in online appendix table 5. The NLP algorithm used
for the study is summarized in figure 1 and as follows.

1. Generic processing 

A. Text tokenization and sentence detection. 

B. Find mentions of dates and numbers; identify the date 
of the encounter to be used in date resolution. 

C. Resolve dates in document order, and calculate Julian 
format of the midpoint of each date range. 

2. Find clinical data (concept extraction) 

A. Find mentions of body locations using UMLS vocabu- 
lary; look up preferred term. 

B. Find mentions of clinical concepts using UMLS vocabu- 
lary; look up preferred term. 

C. Disambiguate overlapping UMLS vocabulary based on 
confidence scores associated with each contextual rule. 
Semantic type disambiguation of overloaded terms was 
rudimentary; we used context in some cases but relied 
heavily on default assumptions for terms commonly 
used in the context of prostate cancer (especially for 
abbreviations). For the purposes of discovery over a 
closed corpus, the current disambiguation was effective, 
as shown by our evaluated performance. 

3. Find context information 

A. Find context of UMLS vocabulary to identify negations,
conditionals, and hypotheticals, and to associate pain
severity level, body locations and explicit dates. Context
rules were manually built. The negation context rules
used much of the same vocabulary as the NegEx algorithm.
Additional rules to distinguish family history,
conditionals, and hypotheticals were manually developed
from examination of the dataset.

B. Find negated list sentences ('The patient denies nausea,
headache, fevers, or chills.') and negate all concept
mentions within them. These predictable sentences were
common in the dataset.

C. Convert negated instances of pain concepts to severity
scale 0.

4. Clean clinical concept data

A. Associate dates and body locations with UMLS clinical 
concepts. Use the date of the encounter when an explicit 
date for the event cannot be determined. 

B. Assign CUIs based on vocabulary, severity, semantic type, 
and body location. 

C. Update structured data concepts that lack severity or
body locations where the CUI implies them ('severe
headache', C0239889).

D. Delete non-pain negatives, conditionals, and hypotheticals
from the structured dataset in order to create a set
of actual, experienced clinical concepts.

E. Calculate 'days before death' for each clinical concept, 
using the associated Julian formatted date and a lookup 
table listing the Julian formatted date of death for each 
subject in the PELICAN cohort. 
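
Steps 1C and 4E can be sketched as follows. This is an illustration under stated assumptions, not the study's date resolution component (which is described in the online appendix): it assumes that the 'modified Julian format' is the standard modified Julian date (day 0 on 17 November 1858), that an incomplete month such as 'in July' resolves to the most recent such month on or before the encounter date, and that a relative date resolves to a single day. All function names are hypothetical.

```python
import calendar
from datetime import date, timedelta

# Assumption: "modified Julian format" means the standard modified Julian date.
MJD_EPOCH = date(1858, 11, 17)


def to_mjd(d):
    """Whole-day modified Julian date for a calendar date."""
    return (d - MJD_EPOCH).days


def resolve_month(month, encounter):
    """Resolve an incomplete date such as 'in July' to a (start, end) range.

    Assumption: the mention refers to the most recent such month on or
    before the encounter date.
    """
    year = encounter.year if month <= encounter.month else encounter.year - 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def resolve_relative(days_before, anchor):
    """Resolve 'N days prior to ...' against the previous date mention."""
    resolved = anchor - timedelta(days=days_before)
    return resolved, resolved


def midpoint(start, end):
    """Midpoint of a date range, rounded down to a whole day."""
    return start + timedelta(days=(end - start).days // 2)


def days_before_death(start, end, death):
    """Days between the midpoint of a resolved date range and death."""
    return to_mjd(death) - to_mjd(midpoint(start, end))
```

With an encounter dated 15 September 2003, `resolve_month(7, date(2003, 9, 15))` gives the range 1 to 31 July 2003, whose midpoint is 16 July 2003. Working in integer day counts makes the subtraction in step 4E a simple lookup and difference.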

Although our system is proprietary, it could be replicated
using other tools. One might start with any system that extracts
concepts and identifies assertions as defined for the 2010 i2b2/VA
challenge on concepts, assertions, and relations in clinical
text. One could then integrate a temporal reference extraction
and normalization tool such as HeidelTime, GATE with the
Tagger_DateNormalizer plugin, or DANTE, filter out the
pain-related concepts, as listed in appendix table 5, and identify
the level of pain using rules defined in appendix table 2 and the
lookup table defined in appendix table 11.

Study database 

NLP processing of the records database produced structured 
data on each pain mention in each clinical record for all 33 
study subjects. These data were combined with demographic 
and other separately curated data about each subject into a 
single study database suitable for statistical analysis. 



Figure 1 Natural language processing algorithm. CUI, concept unique identifier; UMLS, Unified Medical Language System.
[Figure: free-text clinical records pass through four stages (generic processing; find clinical data; find context info; clean up UMLS-based 'atoms') to produce clinical concept structured data. Example input: "He was seen two weeks ago and continues to have significant pain". Example output: PAIN; CUI: C0030193; Severity: SEVERE; Severity Scale: 3; Days before Death: 578. See also online appendix figure 1.]



Heintzelman NH, et al. J Am Med Inform Assoc 2013;20:898-905. doi:10.1136/amiajnl-2012-001076



Identification of correlates of severe pain 

We undertook a univariate logistic regression analysis to identify
correlates of severe pain for use in a multivariate model; factors
investigated included receipt of various drugs (eg, opioids,
chemotherapy, steroids), body mass index (BMI), receipt of palliative
radiation, and frequency of utilization of health services.
That is, we correlated severe pain, as derived by NLP processing,
with clinical and demographic factors from the structured (ie,
non-NLP-based, pre-existing) portion of the study database. For
this analysis, 'severe pain' was any reading of controlled or severe
pain, that is, any reading of 2 or 3 during a month of observation
versus any other reading (-1 = no data, 0 = explicit report of
no pain, 1 = reported pain not described as controlled or severe);
see online appendix for further details. We then constructed a
multiple regression model to assess the strength of associations
between the occurrence of severe pain and all defined variables
for which p was less than 0.1 in the univariate analysis. Inclusion
of a dichotomous variable indicating 'last year before death' controlled
for time effects. All statistical analyses were conducted
with SAS V9.2 using the Proc Logistic procedure.
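
The monthly outcome coding described above can be sketched in a few lines. The function name and the input layout (month index paired with a severity category) are assumptions made for illustration; the study's actual data preparation was done in the study database and SAS.

```python
def monthly_severe_flags(mentions, months):
    """Code each month of observation for the logistic regression outcome.

    mentions: iterable of (month_index, category) pairs, category in 0..3.
    months:   iterable of month indices under observation.
    Returns {month: (max_reading, severe_flag)}, where max_reading is -1
    when the month has no pain data and severe_flag is 1 when any reading
    in the month was 2 (controlled) or 3 (severe).
    """
    readings = {month: -1 for month in months}  # -1 = no data
    for month, category in mentions:
        if month in readings:
            readings[month] = max(readings[month], category)
    return {m: (r, int(r >= 2)) for m, r in readings.items()}
```

Because the outcome is 'any reading of 2 or 3 during the month', taking the monthly maximum and testing whether it is at least 2 gives the same flag as scanning every reading.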

Visualization of patients' experience of pain 

We determined a pain index value for each subject during four
intervals before death, with pain index defined as the mean
monthly maximum pain value (max_pain) for months in which
a pain report was available; the monthly max_pain values used
were no pain = 0, some pain = 1, and controlled pain or severe
pain = 2 (see figure 3). We then obtained longitudinal views of
pain status in each subject by plotting color-coded monthly
max_pain values from diagnosis until death (figure 2). When no
pain status report was available for a given month, we used the
most recent pain status as the imputed value for a given subject.
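
A minimal sketch of the two calculations just described, assuming each subject's data are held as a list with one entry per month (a category 0 to 3, or `None` when no pain report was available). The function names are hypothetical.

```python
def recode(category):
    """Map the four-tiered scale to pain index values: 0, 1, 2, 2."""
    return min(category, 2)


def pain_index(monthly_max):
    """Mean recoded monthly maximum over months that have a pain report."""
    observed = [recode(c) for c in monthly_max if c is not None]
    return sum(observed) / len(observed) if observed else None


def carry_forward(monthly_max):
    """Impute missing months with the most recent reported pain status.

    Months before the first report stay None (shown as 'no report').
    """
    filled, last = [], None
    for category in monthly_max:
        if category is not None:
            last = category
        filled.append(last)
    return filled
```

For the monthly series `[None, 1, None, 3, 0, None]`, the pain index is (1 + 2 + 0) / 3 = 1.0, and the carried-forward series used for plotting is `[None, 1, 1, 3, 0, 0]`. Note that the index uses only observed months, whereas imputation applies only to the longitudinal display.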

To test the possibility of visualizing a summary of pain records
from a group of subjects, we displayed the fraction of study subjects
in each pain severity up to the time of death (see online appendix
figure 2A), as well as the worst pain severity detected for each
subject for each month up to death (see online appendix figure 2B).

RESULTS 

The purpose of the project was discovery over a closed dataset and
a study of feasibility. To evaluate and improve the performance of
the NLP algorithm on the dataset, we completed multiple rounds
of SME evaluation (GSB and RJT). Across all patient encounters,
the NLP algorithm identified 6387 pain mentions (mean 1.45 pain
mentions per record) and 13 827 drug mentions.

Evaluation of NLP method within PELICAN clinical 
text records 

After development, we evaluated performance on the closed
PELICAN corpus using the AeroText Answer Key Editor. The
SMEs separately corrected 32 automatically annotated full text
clinical encounter records randomly selected from the entire
study set to create 'answer keys'. These 32 records contained 207
mentions of pain. The NLP developers had no influence on the
correction of the annotations. Inter-annotator agreement on pain
mention (exact token match in the text and normalized concept
name), pain start and end date (exact match), body location of
pain (exact match), and pain severity integer are shown in table
2A, B. We assessed inter-annotator agreement by scoring one
annotation set against the other. The entire team then met to
discuss and adjudicate the two sets of corrections. Pain mentions
on which there were disagreements were resolved to form the
gold standard answer key; see online appendix table 6 for examples.
We then assessed system performance compared against the
gold standard answer key, requiring correct answers (region of
text, normalized concept name, body location, and pain severity
integer) to be exact matches. Recall is the percentage of pain mentions
in the record that were correctly identified by the NLP
system. Precision is the percentage of pain mentions identified by
the NLP system that are correct. F-measure is the harmonic mean
of precision and recall, and provides a measure of overall accuracy.
F-measure for pain mention detection was 0.95, and for overall
average pain severity assignment was 0.81 (see also table 3).
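
The three measures can be computed from the counts reported in table 3. The sketch below is an inference rather than the AeroText scoring code: counting 'incorrect' (Inc) answers in both denominators is an assumption that reproduces the recall and precision values shown in table 3 for the PELICAN pain mention and date rows.

```python
def score(tp, inc, fn, fp):
    """Recall, precision, and F-measure from answer-key counts.

    tp: true positives; inc: incorrect (found, but with a wrong value);
    fn: false negatives; fp: false positives.
    Assumption: incorrect answers count against both recall and precision.
    """
    recall = tp / (tp + inc + fn)
    precision = tp / (tp + inc + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure


# PELICAN pain mention row of table 3: TP 153, Inc 0, FN 7, FP 9.
recall, precision, f_measure = score(153, 0, 7, 9)
print(round(recall, 2), round(precision, 2), round(f_measure, 2))
```

For the pain mention row this gives recall 153/160, precision 153/162, and an F-measure that rounds to the 0.95 reported in the text. The function does not guard against a row with no true positives, where the denominators can be zero.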

Evaluation of NLP methods within i2b2 discharge 
summaries 

We further evaluated the generalizability of our NLP methods
using a blind test set from 889 unannotated, deidentified discharge
summaries from i2b2. Detailed methods are provided
in the online appendix. A test set of 30 discharge summaries
(containing 111 pain mentions) was chosen and kept unknown
to the NLP developers at all times during the evaluation
process. The remaining i2b2 records were designated the 'development
set.' Ground truth was created using the same process
as the PELICAN evaluation, with the added control that the
annotation process was supervised by a developer not involved
in the project. Inter-annotator agreement on the i2b2 corpus is
shown in table 2C, D. The SMEs adjudicated each disagreement
to obtain an approved gold standard. Several differences were
noted by the SMEs between the i2b2 and the PELICAN clinical
record corpora, including an increased frequency of ambiguously
dated pain mentions in the i2b2 corpus, as shown by the low
inter-annotator agreement for start and end dates in table 2C.
Further discussion can be found in the online appendix, as can
the adjudicated annotations for our 30-report i2b2 test set.

The NLP system was run, as built, on the blind i2b2 test set
and scored against the approved gold standard using the
AeroText scoring tool. The initial extraction F-measure for pain
mentions in the new test set was 0.87; see appendix for
complete scores. A 10-hour development process was then conducted
to adjust for stylistic differences in the new corpus. The
system was scored again, and the system F-measures on pain
mentions and pain severity increased to 0.90 and 0.81,
respectively.

Table 2 Inter-annotator agreement on (A) pain mention, pain start and end date, and body location on the PELICAN corpus, (B) pain severity
on the PELICAN corpus, (C) pain mention, pain start and end date, and body location on the i2b2 corpus, and (D) pain severity on the i2b2
corpus

| Item | A: PELICAN corpus agreement | C: i2b2 corpus agreement |
|---|---|---|
| Pain mention | 0.97 | 0.88 |
| Start date of pain | 0.93 | 0.79 |
| End date of pain | 0.91 | 0.74 |
| Body location of pain | 0.76 | 0.71 |

| Severity | B: PELICAN corpus agreement | D: i2b2 corpus agreement |
|---|---|---|
| No pain | 0.93 | 0.81 |
| Some pain | 0.91 | 0.86 |
| Controlled pain | 0.79 | 1.00 |
| Severe pain | 0.85 | 0.67 |
| Severity of pain overall average | 0.88 | 0.85 |

i2b2, Informatics for Integrating Biology and the Bedside; PELICAN, Project to Eliminate lethal CANcer.

Table 3 Accuracy of natural language processing algorithm extraction of pain mentions regarding (A) pain mention, pain start and end date,
and body location on the PELICAN corpus, (B) pain severity on the PELICAN corpus, (C) post-development pain mention, pain start and end date,
and body location on the i2b2 corpus, (D) post-development pain severity on the i2b2 corpus

A: PELICAN corpus

| | TP | Inc | FN | FP | Recall | Precision | F-Measure |
|---|---|---|---|---|---|---|---|
| Pain mention | 153 | 0 | 7 | 9 | 0.96 | 0.94 | 0.95 |
| Start date of pain | 145 | 8 | 7 | 9 | 0.91 | 0.90 | 0.90 |
| End date of pain | 145 | 8 | 7 | 9 | 0.91 | 0.90 | 0.90 |
| Body location of pain | 51 | 1 | 23 | 3 | 0.68 | 0.93 | 0.80 |

B: PELICAN corpus

| | TP | Inc | FN | FP | Recall | Precision | F-Measure |
|---|---|---|---|---|---|---|---|
| Explicitly no pain | 19 | 3 | 2 | 0 | 0.79 | 0.86 | 0.83 |
| Some pain | 75 | 18 | 1 | 8 | 0.80 | 0.74 | 0.77 |
| Controlled pain | 17 | 1 | 2 | 0 | 0.85 | 0.94 | 0.90 |
| Severe pain | 20 | 0 | 2 | 1 | 0.91 | 0.95 | 0.93 |
| Severity of pain overall average | 131 | 22 | 7 | 9 | 0.82 | 0.81 | 0.81 |

C: i2b2 corpus

| | TP | Inc | FN | FP | Recall | Precision | F-Measure |
|---|---|---|---|---|---|---|---|
| Pain mention | 105 | 0 | 6 | 17 | 0.95 | 0.86 | 0.90 |
| Start date of pain | 74 | 31 | 6 | 17 | 0.67 | 0.61 | 0.64 |
| End date of pain | 73 | 32 | 6 | 17 | 0.66 | 0.60 | 0.63 |
| Body location of pain | 64 | 0 | 42 | 10 | 0.60 | 0.86 | 0.73 |

D: i2b2 corpus

| | TP | Inc | FN | FP | Recall | Precision | F-Measure |
|---|---|---|---|---|---|---|---|
| Explicitly no pain | 18 | 1 | 0 | 3 | 0.95 | 0.82 | 0.88 |
| Some pain | 67 | 10 | 2 | 14 | 0.85 | 0.74 | 0.79 |
| Controlled pain | 5 | 0 | 3 | 0 | 0.63 | 1.00 | 0.81 |
| Severe pain | 4 | 0 | 1 | 0 | 0.80 | 1.00 | 0.90 |
| Severity of pain overall average | 94 | 11 | 6 | 17 | 0.85 | 0.77 | 0.81 |

FN, false negative; FP, false positive; i2b2, Informatics for Integrating Biology and the Bedside; Inc, incorrect; PELICAN, Project to Eliminate lethal CANcer; TP, true positive.

Date association accuracy was significantly lower than for the
PELICAN corpus, falling for start date from 0.90 for PELICAN
to only 0.64 for i2b2. We believe that this was the result of a
larger number of ambiguous date references in the i2b2 corpus
and differences in the annotation guides used by the SMEs to
annotate the two corpora; see online appendix for further
discussion.

Post-development measures of the NLP extraction over the
i2b2 corpus are given in table 3; the final NLP extraction of
pain in the i2b2 test set is given in the online appendix.
Developers remained blind to the test set throughout the development
process. The blind evaluation on an independent
dataset showed that, with 10 h of development time to adjust
for corpus stylistic differences, the NLP system developed for
this project is generalizable beyond the PELICAN corpus.

Pain phenotype exploration 

Overall, pain increased markedly during the last 2 years of life
(figure 2). Metastatic prostate cancer was the listed cause of
death in all study cases, and none of the subjects was found to
have significant additional contributing causes of death. In the
final year of life, subject pain index varied widely, from 0.3 to
1.6, with a roughly equal distribution of subjects across this spectrum.
The five African-American study subjects clustered at the
high end of the pain index spectrum (range 1.3-1.6) (table 4).

The system detected no severe or controlled pain in two subjects
(8 and 30). The number of clinical encounter records available
per year between diagnosis and death for these two
subjects was 32 and 19, indicating that the lack of severe pain
reports in these two subjects was not due to a lack of clinical
encounters. We found no evidence that these subjects died
earlier in the course of their disease from non-cancer causes.
Since bone pain is the major source of pain in men with metastatic
prostate cancer, we reviewed bone scan findings in these
two subjects, and both demonstrated widespread bone changes
consistent with metastatic prostate cancer, similar to scan results
from all other study patients.

Figure 2 Study subject pain ribbons 'algograph' display. Maximum pain
reported by each subject from prostate cancer diagnosis until time of death
from prostate cancer. Pain data are reported for each month. Grey, no
report; green, no pain; yellow, pain; orange, controlled pain; red, severe
pain.

Figure 3 Pain index by subject in last months of life. Five African-American
subjects have asterisks added to their study subject numbers.

Correlates of severe pain 

In the initial univariate analysis, all considered variables except 
for receipt of definitive radiation and maximum recorded BMI 
correlated significantly with severe pain. African-American eth- 
nicity was borderline associated with severe pain (OR 1.5, 
p = 0.09). Receipt of opiates (OR 25.6, p<0.001), palliative radi- 
ation (OR 13.8, p<0.0001), and being in the last year of life 
(OR 9.9, p< 0.001) were strongly associated with severe pain. 
See online appendix for detailed univariate analysis results. 

In the multivariate analysis, only five of the 12 remaining
factors were significantly associated with severe pain (p<0.1):
receipt of palliative radiation, opioids, or chemotherapy; being
in the last year of life; and the number of outpatient visits (table 4).
Receipt of non-steroidal anti-inflammatory drugs (NSAIDs), corticosteroids
or sex-steroid-manipulating drugs was not significantly
associated. These findings are consistent with current
clinical practice, where palliative radiation and opioids
are treatments typically reserved for severe pain, and NSAIDs,
corticosteroids, and sex-steroid drugs are used more generally
across the pain spectrum. Similarly, the last year of life is clinically
known to be when severe pain is most common and
when clinical encounters are most frequent. The multivariate
model found no significant association with increasing serum
PSA concentration, age at diagnosis, or decline in BMI to
<90% maximum after controlling for the effects of time. There
was a non-significant trend associating African-American ethnicity
with more severe pain.

The model and findings were robust, explaining 83% of the 
variability in the data. When we excluded six patients who were 
seen only in a community setting and who had fewer recorded 
clinical encounters, the patterns of association remained 
unchanged. Moreover, when we removed from the model all 
variables that were not significant in the univariate analysis, the 
strengths of the associations (adjusted ORs) of the remaining 
variables and p values changed only marginally. 

DISCUSSION 

In multivariate regression analysis, pain status detected by NLP 
correlated statistically with parameters clinically known to be 
associated with increased pain. Conversely, pain status detected 
by NLP was not associated with parameters not expected to be 
clinically associated with pain status, such as administration of 
definitive radiation with curative intent. These results suggest 
that meaningful NLP-based pain status monitoring is feasible. 
While this project used a rule-based NLP system, machine- 
learning-based NLP tools should be tested in future work. 

Text in longitudinal data is valuable for the study of symptoms
such as pain, where the clinical unstructured description may be
more complete than it is in structured data. NLP techniques
convert such unstructured data into structured data, which is
typically more amenable to rigorous analysis and display.

Table 4 Multivariate regression analysis of natural language
processing-detected pain and clinical variables detects clinically
expected associations between pain status and administration of
palliative radiation and opioid drugs, as well as months before
death, and number of inpatient and outpatient visits

Variable                        OR point estimate   95% CI          p Value
Received palliative radiation   3.61                1.90 to 6.84    <0.0001
Log PSA concentration           1.09                0.86 to 1.38    0.492
Age at diagnosis                1.00                1.00 to 1.00    0.655
African-American                1.56                0.80 to 3.05    0.190
Subject BMI <90% of maximum     1.02                0.62 to 1.67    0.935
Chemotherapy administered       1.75                1.01 to 3.04    0.046
NSAID administered              1.34                0.78 to 2.29    0.284
Opioid drug administered        6.91                4.07 to 11.74   <0.0001
Steroid drug administered       1.21                0.70 to 2.06    0.497
In last year of life            2.52                1.46 to 4.36    0.001
Number outpatient visits        1.29                1.12 to 1.48    0.001
Number inpatient visits         1.30                0.75 to 2.26    0.350

BMI, body mass index; NSAID, non-steroidal anti-inflammatory drug; PSA,
prostate-specific antigen.
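
The ORs, CIs, and p values in table 4 can be cross-checked against one another. The sketch below assumes the intervals are symmetric Wald intervals on the log-odds scale, so that the coefficient is ln(OR) and its standard error is (ln(upper) - ln(lower)) / (2 x 1.96). The paper does not state how the intervals were computed, so this is an assumption, and the function name `wald_from_ci` is hypothetical. The numeric inputs are copied from table 4.

```python
import math

# (variable, OR, CI lower, CI upper, reported p) copied from table 4.
ROWS = [
    ("Received palliative radiation", 3.61, 1.90, 6.84, "<0.0001"),
    ("Chemotherapy administered", 1.75, 1.01, 3.04, "0.046"),
    ("NSAID administered", 1.34, 0.78, 2.29, "0.284"),
    ("Opioid drug administered", 6.91, 4.07, 11.74, "<0.0001"),
    ("In last year of life", 2.52, 1.46, 4.36, "0.001"),
]


def wald_from_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover (beta, se, z, p) from an OR and its 95% CI.

    Assumes a Wald interval: exp(beta +/- z_crit * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = beta / se
    # Two-sided p value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


if __name__ == "__main__":
    for name, odds_ratio, lower, upper, reported in ROWS:
        beta, se, z, p = wald_from_ci(odds_ratio, lower, upper)
        print(f"{name}: beta={beta:.3f} se={se:.3f} z={z:.2f} "
              f"p={p:.4g} (reported {reported})")
```

Under this assumption the recomputed p values should land close to the reported ones. Small differences are expected because the table values are rounded to two decimal places.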



Relief of pain is essential in the management of many acute
and chronic diseases, and convenient automated monitoring of
patient pain status could provide a valuable new tool for
improving quality of life and care. Real-time, easy-to-interpret
views of the pain status history of an individual patient or a
group of patients, as shown in figure 2 and online appendix
figure 2, could allow busy clinicians to identify patients most in
need of increased pain management intensity, and allow
researchers to perform visual and quantitative comparisons of
groups of subjects participating in clinical trials of novel
therapies or novel clinical interventions.
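
As a minimal illustration of how dated pain mentions could be rolled up into a per-month status history for display, consider the sketch below. It is not the software used to produce the figures. The text names only the 'controlled' and 'severe' categories of the four-tiered scale, so the tier names `none` and `some`, the tier ordering, the rule of taking the highest tier seen in each month, and the function names are all assumptions.

```python
from datetime import date

# Hypothetical tier names and ordering, lowest to highest.
TIERS = ["none", "some", "controlled", "severe"]
RANK = {tier: i for i, tier in enumerate(TIERS)}


def monthly_status(mentions):
    """Map (year, month) to the highest pain tier mentioned in that month.

    `mentions` is an iterable of (date, tier) pairs.
    """
    status = {}
    for when, tier in mentions:
        key = (when.year, when.month)
        if key not in status or RANK[tier] > RANK[status[key]]:
            status[key] = tier
    return status


def timeline(status, start, end):
    """Render one character per month from `start` to `end` inclusive.

    `start` and `end` are (year, month) tuples. Digits 0-3 give the tier
    rank; '.' marks a month with no recorded pain mention.
    """
    chars = []
    year, month = start
    while (year, month) <= end:
        tier = status.get((year, month))
        chars.append(str(RANK[tier]) if tier is not None else ".")
        year, month = (year, month + 1) if month < 12 else (year + 1, 1)
    return "".join(chars)


if __name__ == "__main__":
    mentions = [
        (date(2005, 1, 3), "some"),
        (date(2005, 1, 20), "severe"),
        (date(2005, 3, 2), "controlled"),
    ]
    print(timeline(monthly_status(mentions), (2005, 1), (2005, 4)))
```

Stacking one such line per patient gives a crude text version of a group view, in which months with no clinical encounter remain visibly distinct from months with documented absence of pain.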

NLP-based determination of pain status may help to identify
clinically significant molecular differences between prostate
cancers. For example, a study of the molecular differences in
the cancers of the two men who apparently experienced no
severe pain could provide important clues to the biological
determinants of severe pain in metastatic prostate cancer.
Similarly, the trend toward increased pain experienced by the
five African-Americans compared with the other men in the
study is consistent with an oncology clinical trial which found
that African-American men were more likely than white men to
have extensive disease and bone pain.

Limitations and future directions 

Our study has several limitations. First, the dataset was relatively
small, covering just 33 patients. Second, it was difficult to
distinguish pain mentions that were not related to the subjects'
metastatic prostate cancer. Although SME review of the records
revealed only rare examples of pain not related to prostate
cancer in the current study, future studies should implement
formal methods to identify and link pain to a relevant disease
source. Third, it was difficult to distinguish pain control status
from the patient's current experience of pain. This study
defaulted to 'controlled' pain as one of the pain categories
because there were multiple records where the patient was
noted to be taking opioids for pain, but no current pain level
was provided. Fourth, we may have slightly biased our
annotation of the PELICAN corpus by using system outputs to
initialize annotations. This technique has been shown to
improve consistency, reduce annotation time, and improve
inter-annotator agreement. We minimized possible bias by
having the annotators work independently and by submitting
the results to team scrutiny and collaborative discussion. In
essence, the answer key was generated by compiling the answers
of four (overlapping) SMEs: two humans, the system itself, and
the team as a whole. The similar evaluation results obtained on
the separate i2b2 corpus, which used isolated test and
development sets, suggest that any bias was minimal. Finally, the
use of proprietary ClinREAD and AeroText NLP software may limit
reproducibility. However, this limitation is at least partially
mitigated by our provision of detailed rules, as well as results from
our analysis of the i2b2 corpus. Investigators interested in
further analysis of the study dataset using other methods and
under appropriate confidentiality protection are invited to
contact the senior author.

CONCLUSIONS 

Electronic health records have greatly facilitated detection and
understanding of disease phenotypes and their relationship with
genetic and non-genetic factors. The study reported here,
which we believe to be the first to use NLP to obtain
longitudinal pain status information in a cohort of patients, shows
that NLP-based monitoring of patient pain status is feasible and
generalizable to new datasets, and provides a number of
phenotype-oriented observations useful for guiding future
research. Future studies should focus on comparison of
pain-status tracking by NLP versus other validated pain survey
tools, and on practical integration of the two methods in
settings where electronic health records are in routine use.